Methods and compositions are provided for analyzing nucleic 
acid sequences based on repeated cycles of duplex extension 
along a single-stranded template. Preferably, such extension starts 
from a duplex formed between an initializing oligonucleotide 
and the template. As illustrated in the figure, the initializing 
oligonucleotide is extended in an initial extension cycle by ligating 
an oligonucleotide probe to its end to form an extended duplex. 
The extended duplex is then repeatedly extended by subsequent 
cycles of ligation. During each cycle, the identity of one or 
more nucleotides in the template is determined by a label on, 
or associated with, a successfully ligated oligonucleotide probe. 
The invention provides a method of sequencing nucleic acid 
which obviates electrophoretic separation of similarly sized DNA 
fragments, and which eliminates the difficulties associated with 
the detection and analysis of spatially overlapping bands of DNA 
fragments in a gel or like medium. The invention also obviates 
the need to generate DNA fragments from long single-stranded 
templates with a DNA polymerase. 





WO 96/33205 



PCT/US96/05245 



DNA SEQUENCING BY PARALLEL 
OLIGONUCLEOTIDE EXTENSIONS 

Field of the Invention 

The invention relates generally to methods for determining the nucleotide sequence
of a polynucleotide, and more particularly, to a method of identifying nucleotides in a
template by stepwise extension of one or more primers by successive ligations of 
oligonucleotide blocks. 

Background

Analysis of polynucleotides with currently available techniques provides a spectrum 
of information ranging from the confirmation that a test polynucleotide is the same or 
different than a standard or an isolated fragment to the express identification and ordering 
of each nucleoside of the test polynucleotide. Not only are such techniques crucial for
understanding the function and control of genes and for applying many of the basic
techniques of molecular biology, but they have also become increasingly important as tools 
in genomic analysis and a great many non-research applications, such as genetic 
identification, forensic analysis, genetic counselling, medical diagnostics, and the like. In 
these latter applications both techniques providing partial sequence information, such as
fingerprinting and sequence comparisons, and techniques providing full sequence
determination have been employed, e.g. Gibbs et al, Proc. Natl. Acad. Sci., 86: 1919-1923
(1989); Gyllensten et al, Proc. Natl. Acad. Sci., 85: 7652-7656 (1988); Carrano et al,
Genomics, 4: 129-136 (1989); Caetano-Anolles et al, Mol. Gen. Genet., 235: 157-165
(1992); Brenner and Livak, Proc. Natl. Acad. Sci., 86: 8902-8906 (1989); Green et al,
PCR Methods and Applications, 1: 77-90 (1991); and Versalovic et al, Nucleic Acids
Research, 19: 6823-6831 (1991).

Native DNA consists of two linear polymers, or strands of nucleotides. Each strand 
is a chain of nucleosides linked by phosphodiester bonds. The two strands are held together 
in an antiparallel orientation by hydrogen bonds between complementary bases of the
nucleotides of the two strands: deoxyadenosine (A) pairs with thymidine (T) and
deoxyguanosine (G) pairs with deoxycytidine (C). 

Presently there are two basic approaches to DNA sequence determination: the 
dideoxy chain termination method, e.g. Sanger et al, Proc. Natl. Acad. Sci., 74: 5463-5467 
(1977); and the chemical degradation method, e.g. Maxam et al, Proc. Natl. Acad. Sci.,
74: 560-564 (1977). The chain termination method has been improved in several ways, and
serves as the basis for all currently available automated DNA sequencing machines, e.g.
Sanger et al, J. Mol. Biol., 143: 161-178 (1980); Schreier et al, J. Mol. Biol., 129: 169-172
(1979); Smith et al, Nucleic Acids Research, 13: 2399-2412 (1985); Smith et al,
Nature, 321: 674-679 (1987); Prober et al, Science, 238: 336-341 (1987); Section II, Meth.
Enzymol., 155: 51-334 (1987); Church et al, Science, 240: 185-188 (1988); Hunkapiller et
al, Science, 254: 59-67 (1991); Bevan et al, PCR Methods and Applications, 1: 222-228
(1992).

Both the chain termination and chemical degradation methods require the generation 
of one or more sets of labeled DNA fragments, each having a common origin and each 
terminating with a known base. The set or sets of fragments must then be separated by size
to obtain sequence information. In both methods, the DNA fragments are separated by high
resolution gel electrophoresis, which must have the capacity to distinguish very large
fragments differing in size by no more than a single nucleotide. Unfortunately, this step
severely limits the size of the DNA chain that can be sequenced at one time. Sequencing
using these techniques can reliably accommodate a DNA chain of up to about 400-450
nucleotides, Bankier et al, Meth. Enzymol., 155: 51-93 (1987); and Hawkins et al,
Electrophoresis, 13: 552-559 (1992). 

Several significant technical problems have seriously impeded the application of 
such techniques to the sequencing of long target polynucleotides, e.g. in excess of 500-600 
nucleotides, or to the sequencing of high volumes of many target polynucleotides. Such
problems include i) the gel electrophoretic separation step which is labor intensive, is
difficult to automate, and introduces an extra degree of variability in the analysis of data,
e.g. band broadening due to temperature effects, compressions due to secondary structure in
the DNA sequencing fragments, inhomogeneities in the separation gel, and the like; ii)
nucleic acid polymerases whose properties, such as processivity, fidelity, rate of
polymerization, rate of incorporation of chain terminators, and the like, are often sequence
dependent; iii) detection and analysis of DNA sequencing fragments which are typically
present in fmol quantities in spatially overlapping bands in a gel; iv) lower signals because
the labelling moiety is distributed over the many hundred spatially separated bands rather
than being concentrated in a single homogeneous phase; and v) in the case of single-lane
fluorescence detection, the availability of dyes with suitable emission and absorption
properties, quantum yield, and spectral resolvability, e.g. Trainor, Anal. Biochem., 62:
418-426 (1990); Connell et al, Biotechniques, 5: 342-348 (1987); Karger et al, Nucleic
Acids Research, 19: 4955-4962 (1991); Fung et al, U.S. patent 4,855,225; and Nishikawa
et al, Electrophoresis, 12: 623-631 (1991).

Another problem exists with current technology in the area of diagnostic
sequencing. An ever widening array of disorders, susceptibilities to disorders, prognoses of
disease conditions, and the like, have been correlated with the presence of particular DNA
sequences, or the degree of variation (or mutation) in DNA sequences, at one or more
genetic loci. Examples of such phenomena include human leukocyte antigen (HLA) typing,
cystic fibrosis, tumor progression and heterogeneity, p53 proto-oncogene mutations, ras
proto-oncogene mutations, and the like, e.g. Gyllensten et al, PCR Methods and
Applications, 1: 91-98 (1991); Santamaria et al, International application PCT/US92/01675;
Tsui et al, International application PCT/CA90/00267; and the like. A difficulty in
determining DNA sequences associated with such conditions to obtain diagnostic or
prognostic information is the frequent presence of multiple subpopulations of DNA, e.g. 
allelic variants, multiple mutant forms, and the like. Distinguishing the presence and 
identity of multiple sequences with current sequencing technology is virtually impossible, 
without additional work to isolate and perhaps clone the separate species of DNA. 

A major advance in sequencing technology could be made if an alternative approach
were available for sequencing DNA that did not require high resolution electrophoretic
separations of DNA fragments, that generated signals more amenable to analysis, and that
provided a means for readily analyzing DNA from heterozygous genetic loci.

An objective of the invention is to provide such an alternative approach to presently
available DNA sequencing technologies.

Summary of the Invention 

The invention provides a method of nucleic acid sequence analysis based on 
repeated cycles of duplex extension along a single stranded template. Preferably, such
extension starts from a duplex formed between an initializing oligonucleotide and the
template. The initializing oligonucleotide is extended in an initial extension cycle by
ligating an oligonucleotide probe to its end to form an extended duplex. The extended
duplex is then repeatedly extended by subsequent cycles of ligation. During each cycle, the
identity of one or more nucleotides in the template is determined by a label on, or
associated with, a successfully ligated oligonucleotide probe. Preferably, the
oligonucleotide probe has a blocking moiety, e.g. a chain-terminating nucleotide, in a
terminal position so that only a single extension of the extended duplex takes place in a 
single cycle. The duplex is further extended in subsequent cycles by removing the blocking 
moiety and regenerating an extendable terminus. 

In one aspect of the invention, a plurality of different initializing oligonucleotides is
provided for separate samples of the template. Each initializing oligonucleotide forms a
duplex with the template such that the end undergoing extension is one or more nucleotides
out of register, or phase, with that of every other initializing oligonucleotide of the
plurality. In other words, the starting nucleotide for extension is different by one or more
nucleotides for each of the different initializing oligonucleotides. In this manner, after each
cycle of extension with oligonucleotide probes of the same length, the same relative phase
exists between the ends of the initializing oligonucleotides on the different templates. Thus,
in a preferred embodiment, where, for example, i) the initializing oligonucleotides are out
of phase by one nucleotide, ii) 9-mer oligonucleotide probes are used in the extension step,
and iii) nine different initializing oligonucleotides are employed, nine template nucleotides 
will be identified simultaneously in each extension cycle. 

Brief Description of the Drawings 
Figure 1 diagrammatically illustrates parallel extensions of multiple templates in
accordance with the invention.

Figure 2 diagrammatically illustrates an embodiment of the invention employing
acid-labile linkages.

Figure 3A diagrammatically illustrates an embodiment of the invention employing
RNase H labile oligonucleotides with 3'->5' extensions.

Figure 3B diagrammatically illustrates an embodiment of the invention employing
RNase H labile oligonucleotides with 5'->3' extensions.

Figure 4 diagrammatically illustrates an embodiment of the invention employing
ligation followed by polymerase extension and cleavage.

Definitions 

As used herein "sequence determination," "determining a nucleotide sequence," 
"sequencing," and like terms, in reference to polynucleotides includes determination of 
partial as well as full sequence information of the polynucleotide. That is, the term includes
sequence comparisons, fingerprinting, and like levels of information about a target
polynucleotide, as well as the express identification and ordering of each nucleoside of the
test polynucleotide. 

"Perfectly matched duplex" in reference to the protruding strands of probes and 
target polynucleotides means that the protruding strand from one forms a double stranded 
structure with the other such that each nucleotide in the double stranded structure undergoes
Watson-Crick basepairing with a nucleotide on the opposite strand. The term also
comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with
2-aminopurine bases, and the like, that may be employed to reduce the degeneracy of the
probes.

The term "oligonucleotide" as used herein includes linear oligomers of nucleosides 
or analogs thereof, including deoxyribonucleosides, ribonucleosides, and the like. Usually 
oligonucleotides range in size from a few monomeric units, e.g. 3-4, to several hundreds of 
monomeric units. Whenever an oligonucleotide is represented by a sequence of letters,
such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' order from
left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes 
deoxyguanosine, and "T" denotes thymidine, unless otherwise noted. 
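The sequence-notation convention and the definition of a perfectly matched duplex given above can be illustrated with a short sketch. The function names are hypothetical and introduced only for illustration; the sketch assumes both strands are written 5'->3' using only the four natural bases, and it does not model nucleoside analogs such as deoxyinosine.

```python
# Watson-Crick partners for the four natural deoxynucleosides.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs with `seq`, itself written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_perfectly_matched(strand_a, strand_b):
    """True if two 5'->3' strands of equal length pair at every position.

    The strands of a duplex are antiparallel, so strand_b must equal
    the reverse complement of strand_a.
    """
    return strand_b == reverse_complement(strand_a)


print(reverse_complement("ATGCCTG"))
print(is_perfectly_matched("ATGCCTG", "CAGGCAT"))
```

Because both strands are read 5'->3', the comparison is made against the reverse complement and not the simple complement.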

As used herein, "nucleoside" includes the natural nucleosides, including 2'-deoxy
and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA Replication, 2nd
Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to nucleosides includes
synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g. 
described generally by Scheit, Nucleotide Analogs (John Wiley, New York, 1980). Such 
analogs include synthetic nucleosides designed to enhance binding properties, reduce 
degeneracy, increase specificity, and the like. 

As used herein, "ligation" means to form a covalent bond or linkage between the
termini of two or more nucleic acids, e.g. oligonucleotides and/or polynucleotides, in a
template-driven reaction. The nature of the bond or linkage may vary widely and the 
ligation may be carried out enzymatically or chemically. 

Detailed Description of the Invention

The invention provides a method of sequencing nucleic acids which obviates 
electrophoretic separation of similarly sized DNA fragments, and which eliminates the 
difficulties associated with the detection and analysis of spatially overlapping bands of DNA
fragments in a gel or like medium. The invention also obviates the need to generate DNA
fragments from long single stranded templates with a DNA polymerase.

The general scheme of one aspect of the invention is shown diagrammatically in
Figure 1. As described more fully below, the invention is not meant to be limited by the 
particular features of this embodiment. Template (20) comprising a polynucleotide (50) of 
unknown sequence and binding region (40) is attached to solid phase support (10). 
Preferably, for embodiments employing N-mer probes, the template is divided into N
aliquots, and for each aliquot a different initializing oligonucleotide i_k is provided that forms
a perfectly matched duplex at a location in binding region (40) different from that of the
other initializing oligonucleotides. That is, the initializing oligonucleotides i_1-i_N form a set
of duplexes with the template in the binding region (40), such that the ends of the duplexes
proximal to the unknown sequence are from 0 to N-1 nucleotides from the start of the
unknown sequence. Thus, in the first cycle of ligations with N-mer probes, a terminal
nucleotide (16) of probe (30) ligated to i_1 in Figure 1 will be complementary to the N-1
nucleotide of binding region (40). Likewise, a terminal nucleotide (17) of probe (30)
ligated to i_2 in Figure 1 will be complementary to the N-2 nucleotide of binding region (40);
a terminal nucleotide (18) of probe (30) ligated to i_3 in Figure 1 will be complementary to
the N-3 nucleotide of binding region (40), and so on. Finally, a terminal nucleotide (15) of
probe (30) ligated to i_N in Figure 1 will be complementary to the first nucleotide of unknown
sequence (50). In the second cycle of ligations, a terminal nucleotide (19) of probe (31)
will be complementary to the second nucleotide (19) of unknown sequence (50) in duplexes
starting with initializing oligonucleotide i_1. Likewise, terminal nucleotides of probes ligated
to duplexes starting with initializing oligonucleotides i_2, i_3, i_4, and so on, will be
complementary to the third, fourth, and fifth nucleotides of unknown sequence (50).

In the above embodiment, the oligonucleotide probes are labeled so that the identity
of the nucleotide abutting the extended duplex can be determined from the label.
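The phase relationship described for Figure 1 can be made concrete with a small calculation. The following is a minimal sketch, not part of the disclosed method: the function names are hypothetical, positions are counted from 1 at the first nucleotide of the unknown sequence, and it is assumed that the nucleotide identified in each cycle is the one abutting the end of the extended duplex and that every ligation adds exactly one N-mer probe.

```python
def identified_position(cycle, gap, probe_len):
    """Position read in `cycle` (1-based) for one initializing oligonucleotide.

    `gap` is the number of binding-region nucleotides between the end of
    the initializing oligonucleotide and the start of the unknown sequence
    (0 to probe_len - 1). Results <= 0 lie in the known binding region.
    """
    return (cycle - 1) * probe_len - gap + 1


def positions_by_cycle(probe_len, n_cycles):
    """For each cycle, list the positions read across all probe_len aliquots."""
    table = {}
    for cycle in range(1, n_cycles + 1):
        table[cycle] = sorted(
            identified_position(cycle, gap, probe_len)
            for gap in range(probe_len)
        )
    return table


for cycle, positions in positions_by_cycle(probe_len=9, n_cycles=3).items():
    print(cycle, positions)
```

Under these assumptions, with 9-mer probes the second cycle reads positions 2 through 10 of the unknown sequence and the third cycle reads positions 11 through 19, so the aliquots together cover the template without gaps; most first-cycle reads fall in the binding region, whose sequence is already known.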

Binding region (40) has a known sequence, but can vary greatly in length and 
composition. It must be sufficiently long to accommodate the hybridization of an 
initializing oligonucleotide. Different binding regions can be employed with either identical 
or different initializing oligonucleotides, but for convenience of preparation, it is preferable
to provide identical binding regions and different initializing oligonucleotides. Thus, all
the templates are prepared identically and then separated into aliquots for use with different
initializing oligonucleotides. Preferably, the binding region should be long enough to
accommodate a set of different initializing oligonucleotides, each hybridizing to the
template to produce a different starting point for subsequent ligations. Most preferably, the
binding region is between about 20 and 50 nucleotides in length.

Initializing oligonucleotides are selected to form highly stable duplexes with the 
binding region that remain intact during any washing steps of the extension cycles. This is 
conveniently achieved by selecting the length(s) of the initializing oligonucleotides to be 
considerably longer than that, or those, of the oligonucleotide probes and/or by selecting 
them to be GC-rich. Initializing oligonucleotides may also be cross-linked to the template
strand by a variety of techniques, e.g. Summerton et al, U.S. patent 4,123,610; or they
may be comprised of nucleotide analogs that form duplexes of greater stability than their
natural counterparts, e.g. peptide nucleic acids, Science, 254: 1497-1500 (1991); Hanvey et
al, Science, 258: 1481-1485 (1992); and PCT applications PCT/EP92/01219 and
PCT/EP92/01220.

Preferably, the length of the initializing oligonucleotide is from about 20 to 30 
nucleotides and its composition comprises a sufficient percentage of G's and C's to provide 
a duplex melting temperature that exceeds those of the oligonucleotide probes being
employed by about 10-50°C. More preferably, the duplex melting temperature of the
initializing oligonucleotide exceeds those of the oligonucleotide probes by about 20-50°C.
The number, N, of distinct initializing oligonucleotides employed in a sequencing operation
can vary from one, in the case where a single nucleotide is identified at each cycle, to a
plurality whose size is limited only by the size of oligonucleotide probe that can be
practically employed. Factors limiting the size of the oligonucleotide probe include the
difficulty in preparing mixtures having sufficiently high concentrations of individual probes
to drive hybridization reactions at a reasonable rate, the susceptibility of longer probes to
forming secondary structures, reduction in sensitivity to single base mismatches, and the
like. Preferably, N is in the range of from 1 to 16; more preferably, N is in the range of
from 1 to 12; and most preferably, N is in the range of from 1 to 8.
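The preferred melting-temperature margin between an initializing oligonucleotide and a probe can be screened with a rough calculation. The sketch below is illustrative only. The text does not specify how melting temperatures are to be estimated; the two rule-of-thumb formulas used here (the Wallace rule for sequences shorter than 14 nucleotides and a GC-content formula for longer ones) are assumptions that ignore salt and strand concentration, and the function names and example sequences are hypothetical.

```python
def basic_tm(seq):
    """Very rough duplex melting temperature estimate in degrees C.

    Assumed rules of thumb, not taken from the specification:
      fewer than 14 nt:  Tm = 2*(A+T) + 4*(G+C)
      14 nt or more:     Tm = 64.9 + 41*(G+C - 16.4)/length
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    length = gc + at
    if length < 14:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / length


def tm_margin_ok(initializer, probe, low=10.0, high=50.0):
    """Check that the initializer Tm exceeds the probe Tm by low..high degrees."""
    margin = basic_tm(initializer) - basic_tm(probe)
    return low <= margin <= high


initializer = "GC" * 8 + "AT" * 4   # hypothetical 24-mer, 16 of 24 bases G or C
probe = "ATGCATGCA"                 # hypothetical 9-mer probe
print(basic_tm(initializer), basic_tm(probe), tm_margin_ok(initializer, probe))
```

Since probes are applied as mixtures, a practical check would compare the initializing oligonucleotide against the range of melting temperatures across the mixture rather than against a single probe sequence.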

A wide variety of oligonucleotide probes can be used with the invention. 
Generally, the oligonucleotide probes should be capable of being ligated to an initializing 
oligonucleotide or extended duplex to generate the extended duplex of the next extension 
cycle; the ligation should be template-driven in that the probe should form a duplex with the
template prior to ligation; the probe should possess a blocking moiety to prevent multiple
probe ligations on the same template in a single extension cycle; the probe should be
capable of being treated or modified to regenerate an extendable end after ligation; and the
probe should possess a signaling moiety that permits the acquisition of sequence information
relating to the template after a successful ligation. As described more fully below,
depending on the embodiment, the extended duplex or initializing oligonucleotide may be
extended in either the 5'->3' direction or the 3'->5' direction by oligonucleotide probes.
Generally, the oligonucleotide probe need not form a perfectly matched duplex with the
template, although such binding is usually preferred. In preferred embodiments in which a
single nucleotide in the template is identified in each extension cycle, perfect base pairing is
only required for identifying that particular nucleotide. For example, in embodiments
where the oligonucleotide probe is enzymatically ligated to an extended duplex, perfect base
pairing (i.e. proper Watson-Crick base pairing) is required between the terminal nucleotide
of the probe which is ligated and its complement in the template. Generally, in such
embodiments, the rest of the nucleotides of the probe serve as "spacers" that ensure the next
ligation will take place at a predetermined site, or number of bases, along the template.
That is, their pairing, or lack thereof, does not provide further sequence information.
Likewise, in embodiments that rely on polymerase extension for base identification, the
probe primarily serves as a spacer, so specific hybridization to the template is not critical,
although it is desirable.

Preferably, the oligonucleotide probes are applied to templates as mixtures 
comprising oligonucleotides of all possible sequences of a predetermined length. The 
complexity of such mixtures can be reduced by a number of methods, including using so-called
degeneracy-reducing analogs, such as deoxyinosine and the like, e.g. as taught by
Kong Thoo Lin et al, Nucleic Acids Research, 20: 5149-5152; U.S. patent 5,002,867;
Nichols et al, Nature, 369: 492-493 (1994); or by separately applying multiple mixtures of 
oligonucleotide probes, e.g. four mixtures comprising four disjoint subsets of 
oligonucleotide sequences that taken together would comprise all possible sequences of the 
predetermined length. 
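The size of such probe mixtures, and one way of dividing them into four disjoint subsets, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and partitioning by the nucleotide at one fixed position (here the first) is only one possible choice; the text does not prescribe how the subsets are formed.

```python
from itertools import product

BASES = "ACGT"


def all_probes(length):
    """Enumerate every sequence of the given length over A, C, G, T."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def split_by_first_base(probes):
    """Partition probes into four disjoint subsets keyed by their first base."""
    subsets = {base: [] for base in BASES}
    for probe in probes:
        subsets[probe[0]].append(probe)
    return subsets


probes = all_probes(4)                 # 4**4 = 256 sequences
subsets = split_by_first_base(probes)
print(len(probes), {base: len(group) for base, group in subsets.items()})
print(4 ** 9)                          # number of distinct 9-mers
```

The mixture grows as 4 to the power of the probe length, so a complete 9-mer mixture contains 262,144 sequences. This growth is one reason degeneracy-reducing analogs or separately applied subsets may be attractive.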

Initializing oligonucleotides and oligonucleotide probes of the invention are
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied Biosystems,
Inc. (Foster City, California) model 392 or 394 DNA/RNA Synthesizer, using standard
chemistries, such as phosphoramidite chemistry, e.g. disclosed in the following references:
Beaucage and Iyer, Tetrahedron, 48: 2223-2311 (1992); Molko et al, U.S. patent
4,980,460; Koster et al, U.S. patent 4,725,677; Caruthers et al, U.S. patents 4,415,732;
4,458,066; and 4,973,679; and the like. Alternative chemistries, e.g. resulting in non-natural
backbone groups, such as phosphorothioate, phosphoramidate, and the like, may
also be employed provided that the resulting oligonucleotides are compatible with the
ligation and other reagents of a particular embodiment. Mixtures of oligonucleotide probes
are readily synthesized using well known techniques, e.g. as disclosed in Telenius et al,
Genomics, 13: 718-725 (1992); Welsh et al, Nucleic Acids Research, 19: 5275-5279 
(1991); Grothues et al, Nucleic Acids Research, 21: 1321-1322 (1993); Hartley, European 
patent application 90304496.4; and the like. Generally, these techniques simply call for the 
application of mixtures of the activated monomers to the growing oligonucleotide during the
coupling steps where one desires to introduce the degeneracy.

When conventional ligases are employed in the invention, as described more fully
below, the 5' end of the probe may be phosphorylated in some embodiments. A 5'
monophosphate can be attached to an oligonucleotide either chemically or enzymatically
with a kinase, e.g. Sambrook et al, Molecular Cloning: A Laboratory Manual, 2nd Edition
(Cold Spring Harbor Laboratory, New York, 1989). Chemical phosphorylation is described
by Horn and Urdea, Tetrahedron Lett., 27: 4705 (1986), and reagents for carrying out the
disclosed protocols are commercially available, e.g. 5' Phosphate-ON™ from Clontech
Laboratories (Palo Alto, California). Preferably, when required, oligonucleotide probes are
chemically phosphorylated. 

The probes of the invention can be labeled in a variety of ways, including the direct 
or indirect attachment of fluorescent moieties, colorimetric moieties, and the like. Many 
comprehensive reviews of methodologies for labeling DNA and constructing DNA probes
provide guidance applicable to constructing probes of the present invention. Such reviews
include Matthews et al, Anal. Biochem., Vol 169, pgs. 1-25 (1988); Haugland, Handbook
of Fluorescent Probes and Research Chemicals (Molecular Probes, Inc., Eugene, 1992);
Keller and Manak, DNA Probes, 2nd Edition (Stockton Press, New York, 1993); and
Eckstein, editor, Oligonucleotides and Analogues: A Practical Approach (IRL Press,
Oxford, 1991); and the like. Many more particular methodologies applicable to the
invention are disclosed in the following sample of references: Fung et al, U.S. patent
4,757,141; Hobbs, Jr., et al, U.S. patent 5,151,507; Cruickshank, U.S. patent 5,091,519
(synthesis of functionalized oligonucleotides for attachment of reporter groups); Jablonski et
al, Nucleic Acids Research, 14: 6115-6128 (1986) (enzyme-oligonucleotide conjugates); and
Urdea et al, U.S. patent 5,124,246 (branched DNA).

Preferably, the probes are labeled with one or more fluorescent dyes, e.g. as
disclosed by Menchen et al, U.S. patent 5,188,934; and Bergot et al, PCT application
PCT/US90/05565.

Guidance in selecting hybridization conditions for the application of oligonucleotide
probes to templates can be found in numerous references, e.g. Wetmur, Critical Reviews in
Biochemistry and Molecular Biology, 26: 227-259 (1991); Dove and Davidson, J. Mol. 
Biol. 5: 467-478 (1962); Hutton, Nucleic Acids Research, 10: 3537-3555 (1977); Breslauer 
et al, Proc. Natl. Acad. Sci. 83: 3746-3750 (1986); Innis et al, editors, PCR Protocols 
(Academic Press, New York, 1990); and the like. 

Generally, when an oligonucleotide probe anneals to a template in juxtaposition to
an end of the extended duplex, the duplex and probe are ligated, i.e. are caused to be
covalently linked to one another. Ligation can be accomplished either enzymatically or
chemically. Chemical ligation methods are well known in the art, e.g. Ferris et al,
Nucleosides & Nucleotides, 8: 407-414 (1989); Shabarova et al, Nucleic Acids Research,
19: 4247-4251 (1991); and the like. Preferably, enzymatic ligation is carried out using a
ligase in a standard protocol. Many ligases are known and are suitable for use in the
invention, e.g. Lehman, Science, 186: 790-797 (1974); Engler et al, DNA Ligases, pages
3-30 in Boyer, editor, The Enzymes, Vol. 15B (Academic Press, New York, 1982); and the
like. Preferred ligases include T4 DNA ligase, T7 DNA ligase, E. coli DNA ligase, Taq
ligase, Pfu ligase, and Tth ligase. Protocols for their use are well known, e.g. Sambrook et
al (cited above); Barany, PCR Methods and Applications, 1: 5-16 (1991); Marsh et al,
Strategies, 5: 73-76 (1992); and the like. Generally, ligases require that a 5' phosphate
group be present for ligation to the 3' hydroxyl of an abutting strand.

Preparing Target Polynucleotides 

Preferably, a target polynucleotide is conjugated to a binding region to form a 
template, and the template is attached to a solid phase support, such as a magnetic particle, 
polymeric microsphere, filter material, or the like, which permits the sequential application
of reagents without complicated and time-consuming purification steps. The length of the
target polynucleotide can vary widely; however, for convenience of preparation, lengths
employed in conventional sequencing are preferred. For example, lengths in the range of a
few hundred basepairs, e.g. 200-300, to 1 to 2 kilobase pairs are preferred.

The target polynucleotides can be prepared by various conventional methods. For
example, target polynucleotides can be prepared as inserts of any of the conventional
cloning vectors, including those used in conventional DNA sequencing. Extensive guidance
for selecting and using appropriate cloning vectors is found in Sambrook et al, Molecular
Cloning: A Laboratory Manual, Second Edition (Cold Spring Harbor Laboratory, New
York, 1989), and like references. Sambrook et al and Innis et al, editors, PCR Protocols
(Academic Press, New York, 1990) also provide guidance for using polymerase chain
reactions to prepare target polynucleotides. Preferably, cloned or PCR-amplified target 
polynucleotides are prepared which permit attachment to magnetic beads, or other solid 
supports, for ease of separating the target polynucleotide from other reagents used in the 
method. Protocols for such preparative techniques are described fully in Wahlberg et al, 
Electrophoresis, 13: 547-551 (1992); Tong et al, Anal. Chem., 64: 2672-2677 (1992);
Hultman et al, Nucleic Acids Research, 17: 4937-4946 (1989); Hultman et al,
Biotechniques, 10: 84-93 (1991); Syvanen et al, Nucleic Acids Research, 16: 11327-11338
(1988); Dattagupta et al, U.S. patent 4,734,363; Uhlen, PCT application PCT/GB89/00304;
and like references. Kits are also commercially available for practicing such methods, e.g.
Dynabeads™ template preparation kit from Dynal AS. (Oslo, Norway). 

Generally, the size and shape of the microparticles or beads employed in the method of
the invention are not critical; however, microparticles in the size range of a few, e.g. 1-2, to
several hundred, e.g. 200-1000, µm in diameter are preferable, as they minimize reagent and
sample usage while permitting the generation of readily detectable signals, e.g. from
fluorescently labeled probes.

Schemes for Ligating, Capping, and Regenerating Extendable Termini

In one aspect, the invention calls for repeated steps of ligating and identifying
oligonucleotide probes. However, since the ligation of multiple probes to the same
extended duplex in the same step would usually introduce identification problems, it is
useful to prevent multiple extensions and to regenerate extendable termini. Moreover, if the
ligation step is not 100% efficient, it would be desirable to cap extended duplexes that fail
to undergo ligation so that they do not participate in any further ligation steps. That is, a
capping step preferably occurs after a ligation step, by analogy with other synthetic
chemical processes, such as polynucleotide synthesis, e.g. Andrus et al, U.S. patent
4,816,571. This would remove a potentially significant source of noise from signals
generated in subsequent identification steps. 
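The effect of capping on signal quality can be illustrated with a simple numerical model. The sketch below is an assumption-laden illustration, not a description of measured behavior: it supposes that every extendable duplex ligates independently with the same probability in each cycle, that capping (when used) is complete, and that only newly ligated probes contribute signal. The function name is hypothetical.

```python
def cycle_signal(efficiency, cycle, capped):
    """Return (in_phase, out_of_phase) signal fractions for a given cycle.

    in_phase:     duplexes that ligated in every cycle so far, which
                  therefore report the intended template position.
    out_of_phase: duplexes that ligate in this cycle after missing at
                  least one earlier cycle; zero when failures are capped.
    """
    in_phase = efficiency ** cycle
    out_of_phase = 0.0 if capped else efficiency - in_phase
    return in_phase, out_of_phase


for capped in (True, False):
    in_phase, out_of_phase = cycle_signal(efficiency=0.9, cycle=10, capped=capped)
    print(capped, round(in_phase, 4), round(out_of_phase, 4))
```

In this model the in-phase signal is the same with or without capping, but without capping the lagging duplexes keep ligating and eventually contribute more signal than the in-phase population, which is the noise source that a capping step is meant to remove.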

Below, several exemplary schemes for carrying out ligation, capping, regeneration,
and identification steps in accordance with the invention are described. They are presented
for purposes of guidance and are not meant to be limiting. 

A scheme for extending an initializing oligonucleotide or an extended duplex in the
3'->5' direction is illustrated in Figure 2. Template (20) is attached to solid phase support
(10) by its 5' end. This can be conveniently accomplished via a biotin, or like linking
moiety, using conventional techniques. Initializing oligonucleotide (200) having a 5'
phosphate group is annealed to template (20) as described above prior to the initial cycle of
ligation and identification. An oligonucleotide probe (202) of the following form is
employed:

HO-(3')BBB ... BBB(5')-OP(=O)(O-)NH-Bt*

where BBB ... BBB represents the sequence of nucleotides of oligonucleotide probe (202)
and Bt* is a labeled chain-terminating moiety linked to the 5' carbon of the oligonucleotide
via a phosphoramidate group, or other labile linkage, such as a photocleavable linkage.
The nature of Bt* may vary widely. It can be a labeled nucleoside (e.g. coupled via a
5'P->3'N phosphoramidate) or other moiety, so long as it prevents successive ligations. It
may simply be a label connected by a linker, such as described in Agrawal and Tang,
International application PCT/US91/08347. An important feature of the oligonucleotide
probe is that after annealing and ligation (204), the label may be removed and the
extendable end regenerated by treating the phosphoramidate linkage with acid, e.g. as
taught by Letsinger et al, J. Am. Chem. Soc., 94: 292-293 (1971); Letsinger et al,
Biochem., 15: 2810-2816 (1976); Gryaznov et al, Nucleic Acids Research, 20: 3403-3409
(1992); and like references. By way of example, hydrolysis of the phosphoramidate may be
accomplished by treatment with 0.8% trifluoroacetic acid in dichloromethane for 40 minutes
at room temperature. Thus, after annealing, ligating, and identifying the ligated probe via
the label on Bt*, the chain-terminating moiety is cleaved by acid hydrolysis (206), thereby
breaking the phosphorus linkage and leaving a 5' monophosphate on the ligated
oligonucleotide. The steps can be repeated (208) in successive cycles. In one aspect of this
embodiment, a single initializing oligonucleotide may be employed such that only one
nucleotide is identified in each sequencing cycle. For such an embodiment, the above probe
preferably has the following form:

HO-(3')B(5')-OP(=O)(O-)NH-BB ... BBB-Bt*

Thus, after each ligation and acid cleavage step the duplex will be extended by one
nucleotide.
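
The one-nucleotide-per-cycle logic described above can be sketched as a toy model in Python. This is a simplified illustration under stated assumptions, not the chemistry itself: the template is treated as a plain string read in the direction of extension, ligation is assumed to succeed only for the perfectly matched probe, and the label is assumed to report the probe nucleotide at the ligation junction. The function name and parameters are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_by_single_base_cycles(template: str, start: int, probe_len: int, cycles: int) -> str:
    """Toy model of repeated ligate / identify / cleave cycles.

    template  : template bases, written in the direction of extension
    start     : index of the first template base beyond the initializing oligonucleotide
    probe_len : length of the probe that must anneal in each cycle
    cycles    : maximum number of cycles to run
    """
    calls = []
    pos = start
    for _ in range(cycles):
        # The probe needs probe_len template bases to anneal to.
        if pos + probe_len > len(template):
            break
        # Only the perfectly matched probe is assumed to ligate.
        probe = "".join(COMPLEMENT[b] for b in template[pos:pos + probe_len])
        # The label reports the junction nucleotide of the ligated probe.
        label = probe[0]
        calls.append(COMPLEMENT[label])
        # Cleavage leaves a single nucleotide of the probe on the duplex.
        pos += 1
    return "".join(calls)


if __name__ == "__main__":
    print(read_by_single_base_cycles("ACGTTGCA", start=2, probe_len=3, cycles=4))
```

In this model the read stops once the probe would overhang the end of the template, so the last `probe_len - 1` bases are not called.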

A capping step may be introduced prior to hydrolysis. For example, probe (202) 
may have the form: 

HO-(3')BB ... Bp^B ... BB(5')-OP(=O)(O-)NH-Bt*

where "p^" is an exonuclease resistant linkage, such as phosphorothioate,
methylphosphonate, or the like. In such an embodiment, capping can be achieved by
treating the extended duplexes with an exonuclease, such as λ exonuclease, which will
cleave the unligated extended duplexes back to the exonuclease resistant linkage. The
presence of this linkage at the 5' end of the extended duplex will then prevent it from
participating in subsequent ligations. Clearly, many other capping methodologies may be
employed, e.g. acylation, ligation of an inert oligonucleotide, or the like. When free 3'
hydroxyls are involved, capping may be accomplished by extending the duplex with a DNA
polymerase in the presence of chain-terminating nucleoside triphosphates, e.g.
dideoxynucleoside triphosphates, or the like.

The phosphoramidate linkage described above is an example of a general class of
internucleosidic linkages referred to herein as "chemically scissile internucleosidic
linkages." These are internucleosidic linkages that may be cleaved by treating them with
characteristic chemical or physical conditions, such as an oxidizing environment, a reducing
environment, light of a characteristic wavelength (for photolabile linkages), or the like.
Other examples of chemically scissile internucleosidic linkages which may be used in
accordance with the invention are described in Urdea, U.S. patent 5,380,833; Gryaznov et al,
Nucleic Acids Research, 21: 1403-1408 (1993) (disulfide); Gryaznov et al, Nucleic Acids
Research, 22: 2366-2369 (1994) (bromoacetyl); Urdea et al, International application
PCT/US91/05287 (photolabile); and like references.

Further chemically scissile linkages that may be employed with the invention
include chain-terminating nucleotides that may be chemically converted into an extendable
nucleoside. Examples of such compounds are described in the following references:
Canard et al, International application PCT/FR94/00345; Ansorge, German patent
application No. DE 4141178 A1; Metzker et al, Nucleic Acids Research, 22: 4259-4267
(1994); Cheeseman, U.S. patent 5,302,509; Ross et al, International application
PCT/US90/06178; and the like.

A scheme for extending an initializing oligonucleotide or an extended duplex in the
5'->3' direction is illustrated in Figure 3A. Template (20) is attached to solid phase
support (10) by its 3' end. As above, this can be conveniently accomplished via a biotin, or
like linking moiety, using conventional techniques. Initializing oligonucleotide (300) having
a 3' hydroxyl group is annealed to template (20) as described above prior to the initial cycle
of ligation and identification. An oligonucleotide probe (302) of the following form is
employed:

OP(=O)(O-)O-(5')BBB ... BBBRRRRBt*

where BBB ... BBBRRRR represents the sequence of 2'-deoxynucleotides of
oligonucleotide probe (302), "RRRR" represents a sequence of four ribonucleotides of probe
(302), and Bt* is a labeled chain-terminating moiety, as described above. Such mixed
RNA-DNA oligonucleotides are readily synthesized using conventional automated DNA
synthesizers, e.g. Duck et al, U.S. patent 5,011,769. RNase H will cleave the probe
specifically in the center of the four ribonucleotide segment, Hogrefe et al, J. Biol. Chem.,
265: 5561-5566 (1990), leaving a 3' hydroxyl (312) on the extended duplex, which may
participate in subsequent ligation steps. Thus, a cycle in the present embodiment proceeds
by annealing probe (302) to template (20) and ligating (304) to form extended duplex (306).
After identification via Bt*, the extended duplex is treated with RNase H to cleave the label
and regenerate an extendable end. The cycle is then repeated (314). Capping (310) can be
carried out prior to RNase H treatment by extending the unligated ends with a DNA
polymerase in the presence of the four dideoxynucleoside triphosphates, ddATP, ddCTP,
ddGTP, and ddTTP.

As illustrated in Figure 3B, a similar scheme can be employed for 3'->5' extensions.
In such an embodiment, initiating oligonucleotide or extended duplex (330) has a 5'
monophosphate and the oligonucleotide probe (332) has the form:

HO-(3')BBB ... BBBRRRRB ... BBt*

As above, after annealing, ligating (334), and identifying (338), extended duplex (336) is
cleaved by RNase H, which in this case leaves a 5' monophosphate (342) at the terminus of
the extended duplex. With the regenerated extendable end, the cycle can be repeated (344).
A capping step can be included prior to RNase H hydrolysis by either ligating an unlabeled
non-RNA-containing probe, or by removing any remaining 5' monophosphates by treatment
with a phosphatase.

Identification of nucleotides can be accomplished by polymerase extension following
ligation. As exemplified in Figure 4, for this embodiment, template (20) is attached to solid
phase support (10) as described above and initializing oligonucleotide (400) having a 3'
hydroxyl is annealed to the template prior to the initial cycle. Oligonucleotide probes (402)
of the form:

OP(=O)(O-)O-(5')BBB ... BBBRRRRB ... B(3')OP(=O)(O-)O

are annealed to template (20) and ligated (404) to form extended duplex (406). The 3'
monophosphate, which prevents successive ligations of probes in the same cycle, is
removed with phosphatase (408) to expose a free 3' hydroxyl (410). Clearly, alternative
blocking approaches may also be used. Extended duplex (406) is further extended by a
nucleic acid polymerase in the presence of labeled dideoxynucleoside triphosphates (412),
thereby permitting the identification of a nucleotide of template (20) by the label of the
incorporated dideoxynucleotide. The labeled dideoxynucleotide and a portion of probe
(402) are then cleaved (414), for example, by RNase H treatment, to regenerate an
extendable end on extended duplex (406). The cycle is then repeated (416).

In order to reduce the number of separate annealing reactions that must be carried
out, the oligonucleotide probes may be grouped into mixtures, or subsets, of probes whose
perfectly matched duplexes with complementary sequences have similar stability or free
energy of binding. Such subsets of oligonucleotide probes having similar duplex stability
are referred to herein as "stringency classes" of oligonucleotide probes. The mixtures, or
stringency classes, of oligonucleotide probes are then separately combined with the target
polynucleotide under conditions such that substantially only oligonucleotide probes
complementary to the target polynucleotide form duplexes. That is, the stringency of the
hybridization reaction is selected so that substantially only perfectly complementary
oligonucleotide probes form duplexes. These perfectly matched duplexes are then ligated to
form extended duplexes. For a given oligonucleotide probe length, the number of
oligonucleotide probes within each stringency class can vary widely. Selection of
oligonucleotide probe length and stringency class size depends on several factors, such as
length of target sequence and how it is prepared, the extent to which the hybridization
reactions can be automated, the degree to which the stringency of the hybridization reaction
can be controlled, the presence or absence of oligonucleotide probes with complementary
sequences, and the like. Guidance in selecting an appropriate size of stringency class for a
particular embodiment can be found in the general literature on nucleic acid hybridization
and polymerase chain reaction methodology, e.g. Gotoh, Adv. Biophys. 16: 1-52 (1983);
Wetmur, Critical Reviews in Biochemistry and Molecular Biology 26: 227-259 (1991);
Breslauer et al, Proc. Natl. Acad. Sci. 83: 3746-3750 (1986); Wolf et al, Nucleic Acids
Research, 15: 2911-2926 (1987); Innis et al, editors, PCR Protocols (Academic Press, New
York, 1990); McGraw et al, Biotechniques, 8: 674-678 (1990), and the like. Stringency
can be controlled by varying several parameters, including temperature, salt
concentration, concentration of certain organic solvents, such as formamide, and the like.
Preferably, temperature is used to define the stringency classes because the activity of the
various polymerases or ligases employed limits the degree to which salt concentration or
organic solvent concentration can be varied for ensuring specific annealing of the
oligonucleotide probes.
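
The grouping of probes into stringency classes can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the stability score used here is the simple Wallace rule of thumb (2 degrees per A or T, 4 degrees per G or C), which is a stand-in chosen for brevity and is not the free-energy calculation of the references cited above. The function names are hypothetical.

```python
from itertools import product


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature: 2 per A/T plus 4 per G/C (placeholder score)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def stringency_classes(k: int, n_classes: int, stability=wallace_tm):
    """Sort all 4**k probes by a stability score and cut the list into n_classes groups.

    Groups differ in size by at most one probe, and every probe in a group
    scores no higher than any probe in the next group.
    """
    probes = ["".join(p) for p in product("ACGT", repeat=k)]
    probes.sort(key=stability)
    base, extra = divmod(len(probes), n_classes)
    classes, start = [], 0
    for i in range(n_classes):
        size = base + (1 if i < extra else 0)
        classes.append(probes[start:start + size])
        start += size
    return classes


if __name__ == "__main__":
    for cls in stringency_classes(3, 4):
        scores = [wallace_tm(p) for p in cls]
        print(len(cls), min(scores), max(scores))
```

Any other scoring function, for example one based on nearest-neighbor free energies, could be passed as the `stability` argument without changing the partitioning logic.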

Generally, the larger the stringency class the greater the complexity of the
hybridizing mixture and the lower the concentration of any particular oligonucleotide probe
in the mixture. A lower concentration of an oligonucleotide probe having a complementary
site on a target polynucleotide reduces the relative likelihood of the oligonucleotide probe
hybridizing and being ligated. This, in turn, leads to reduced sensitivity. Larger stringency
classes also have a greater variance in the stabilities of the duplexes that form between an
oligonucleotide probe and a complementary sequence. On the other hand, smaller
stringency classes require a larger number of hybridization reactions to ensure that all
oligonucleotide probes of a set are hybridized to a target polynucleotide.

For example, when 8-mer oligonucleotide probes are employed, stringency classes
may include between about 50 and about 500 oligonucleotide probes each. Thus, several
hundred to several thousand hybridization/ligation reactions are required. For larger sized
oligonucleotide probes, much larger stringency classes are required to make the number of
hybridization/extension reactions practical, e.g. 10^4-10^5, or more.
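
The trade-off between class size and the number of reactions can be checked with a short calculation. The Python sketch below is illustrative and assumes only that a complete probe set contains every one of the 4^n possible n-mers and that each class is used in one reaction per pass; the function names are hypothetical.

```python
import math


def reactions_per_pass(probe_length: int, class_size: int) -> int:
    """Number of classes (one reaction each) needed to cover all 4**probe_length probes."""
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    return math.ceil(4 ** probe_length / class_size)


def relative_probe_concentration(class_size: int) -> float:
    """Share of the mixture contributed by any one probe, for an equimolar class."""
    return 1.0 / class_size


if __name__ == "__main__":
    for size in (50, 500):
        print(size, reactions_per_pass(8, size), relative_probe_concentration(size))
```

For 8-mers (65,536 probes) this gives 1311 reactions per complete pass with classes of 50 probes and 132 with classes of 500, while the share of any single probe in the mixture falls from 2% to 0.2%.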

Oligonucleotide probes of the same stringency class can be synthesized
simultaneously, in a manner similar to that in which fully random oligonucleotide probes are
synthesized, e.g. as disclosed in Telenius et al, Genomics, 13: 718-725 (1992); Welsh et al,
Nucleic Acids Research, 19: 5275-5279 (1991); Grothues et al, Nucleic Acids Research,
21: 1321-1322 (1993); Hartley, European patent application 90304496.4; and the like. The
difference is that at each cycle different mixtures of monomers are applied to the growing
oligonucleotide probe chain, wherein the proportion of each monomer in the mixture is
dictated by the proportion of each nucleoside at the position of the oligonucleotide probe in
the stringency class. Stringency classes are readily formed by computing the free energy of
duplex formation by available algorithms, e.g. Breslauer et al, Proc. Natl. Acad. Sci., 83:
3746-3750 (1986); Lowe et al, Nucleic Acids Research, 18: 1757-1761 (1990); or the like.
The oligonucleotide probes can be ordered according to the free energy of binding to their
complement under standard reaction conditions, e.g. with a standard bubble sort, Baase,
Computer Algorithms (Addison-Wesley, Menlo Park, 1978). For example, the following
table lists, from top to bottom, the ten 6-mers with the greatest stability and the ten 6-mers
with the least stability in terms of free energy of duplex formation under standard
hybridization conditions (the free energies being computed via Breslauer (cited above)):


Ranking    Oligonucleotide Probe Sequence (5'->3')

1          GCGCGC
2          CGCGCG
3          CCCGCG
4          CGCCCG
5          CGCGCC
6          CGCGGC
7          CGGCGC
8          GCCGCG
9          GCGCCG
10         GCGCGG
...
4087       TCATAT
4088       TGATAT
4089       CATATA
4090       TATATG
4091       ATCATG
4092       ATGATG
4093       CATCAT
4094       CATGAT
4095       CATATG
4096       ATATAT


Thus, if a stringency class consisted of the first ten 6-mers, the mixture of monomers
for the first (3'-most) position would be 0:4:6:0 (A:C:G:T), for the second position it
would be 0:6:4:0, and so on. If a stringency class consisted of the last ten 6-mers, the
mixture of monomers for the first position would be 1:0:4:5, for the second position it
would be 5:0:0:5, and so on. The resulting mixtures may then be further enriched for
sequences of the desired stringency class by thermal elution, e.g. Miyazawa et al, J. Mol.
Biol., 11:223-237 (1965).
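
The monomer ratios quoted above follow directly from counting nucleotides at each position of the listed 6-mers, starting from the 3' end. The following Python sketch reproduces that count; the sequences are the twenty 6-mers listed in the table above, while the function and variable names are hypothetical.

```python
from collections import Counter

# Sequences written 5'->3', copied from the ranking table.
MOST_STABLE = ["GCGCGC", "CGCGCG", "CCCGCG", "CGCCCG", "CGCGCC",
               "CGCGGC", "CGGCGC", "GCCGCG", "GCGCCG", "GCGCGG"]
LEAST_STABLE = ["TCATAT", "TGATAT", "CATATA", "TATATG", "ATCATG",
                "ATGATG", "CATCAT", "CATGAT", "CATATG", "ATATAT"]


def monomer_ratios(probes):
    """Return one (A, C, G, T) count tuple per position, counted from the 3' end."""
    length = len(probes[0])
    if any(len(p) != length for p in probes):
        raise ValueError("all probes must have the same length")
    ratios = []
    for offset in range(1, length + 1):
        # p[-offset] is the offset-th nucleotide from the 3' end of a 5'->3' string.
        counts = Counter(p[-offset] for p in probes)
        ratios.append(tuple(counts.get(base, 0) for base in "ACGT"))
    return ratios


if __name__ == "__main__":
    for name, probes in (("most stable", MOST_STABLE), ("least stable", LEAST_STABLE)):
        print(name, [":".join(map(str, r)) for r in monomer_ratios(probes)])
```

The first two tuples for each list match the ratios stated in the text; the remaining positions are obtained the same way.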

More conveniently, stringency classes containing several hundred to several
thousands of oligonucleotides may be synthesized directly by a variety of parallel synthesis
approaches, e.g. Frank et al, U.S. patent 4,689,405; Matson et al, Anal. Biochem., 224:
110-116 (1995); Fodor et al, International application PCT/US93/04145; Pease et al, Proc.
Natl. Acad. Sci., 91: 5022-5026 (1994); Southern et al, J. Biotechnology, 35: 217-227
(1994); Brennan, International application PCT/US94/05896; or the like.

In some cases it may be desirable to form additional stringency classes of
oligonucleotide probes by placing in a separate subset oligonucleotide probes having
complementary sequences to other oligonucleotide probes in a subset or oligonucleotide
probes that are susceptible to forming oligonucleotide probe-dimers.

Clearly, one of ordinary skill in the art could combine features of the embodiments
set forth above to design still further embodiments in accordance with the invention, but not
expressly set forth above.

The invention also includes systems and apparatus for carrying out the methods of
the invention automatically. Such systems and apparatus can take a variety of forms
depending on several design constraints, including i) the nature of the solid phase support
used to anchor the target polynucleotide, ii) the degree of parallel operation desired, iii) the
detection scheme employed, iv) whether reagents are re-used or discarded, and the like.
Generally, the apparatus comprises a series of reagent reservoirs, one or more reaction
vessels containing target polynucleotide, preferably attached to a solid phase support, e.g.
magnetic beads, one or more detection stations, and a computer controlled means for
transferring in a predetermined manner reagents from the reagent reservoirs to and from the
reaction vessels and the detection stations. The computer controlled means for transferring
reagents and controlling temperature can be implemented by a variety of general purpose
laboratory robots, such as that disclosed by Harrison et al, Biotechniques, 14: 88-97 (1993);
Fujita et al, Biotechniques, 9: 584-591 (1990); Wada et al, Rev. Sci. Instrum., 54: 1569-1572
(1983); or the like. Such laboratory robots are also available commercially, e.g.
Applied Biosystems model 800 Catalyst (Foster City, CA).

A variety of kits may be provided for carrying out different embodiments of the
invention. Generally, kits of the invention include oligonucleotide probes, initializing
oligonucleotides, and a detection system. Kits further include ligation reagents and
instructions for practicing the particular embodiment of the invention. In embodiments
employing protein ligases, RNase H, nucleic acid polymerases, or other enzymes, their
respective buffers may be included. In some cases, these buffers may be identical.
Preferably, kits also include a solid phase support, e.g. magnetic beads, for anchoring
templates. In one preferred kit, fluorescently labeled oligonucleotide probes are provided
such that probes corresponding to different terminal nucleotides of the target polynucleotide
carry distinct spectrally resolvable fluorescent dyes. As used herein, "spectrally resolvable"
means that the dyes may be distinguished on the basis of their spectral characteristics,
particularly fluorescence emission wavelength, under conditions of operation. Thus, the
identity of the one or more terminal nucleotides would be correlated to a distinct color, or
perhaps ratio of intensities at different wavelengths. More preferably, four such probes are
provided that allow a one-to-one correspondence between each of four spectrally resolvable
fluorescent dyes and the four possible terminal nucleotides on a target polynucleotide. Sets
of spectrally resolvable dyes are disclosed in U.S. patents 4,855,225 and 5,188,934;
International application PCT/US90/05565; and Lee et al, Nucleic Acids Research, 20:
2471-2483 (1992).

Example 1

Sequencing a Target Polynucleotide Amplified from pUC19
with Four Initializing Oligonucleotides

In this example, a template comprising a binding region and a portion of the pUC19
plasmid is amplified by PCR and attached to magnetic beads. Four initializing
oligonucleotides are employed in separate reactions as indicated below. 8-mer
oligonucleotide probes are employed having 4 central ribonucleotides and both 5' and 3'
monophosphates, as shown in the following formula:

OP(=O)(O-)O-(5')BBRRRRBB(3')-OP(=O)(O-)O

After annealing, probes are enzymatically ligated to the initializing oligonucleotides and the
magnetic bead supports are washed. The 3' phosphates of the ligated probes are removed
with phosphatase, after which the probes are extended with DNA polymerase in the
presence of the four labeled dideoxynucleoside triphosphate chain terminators. After
washing and identification of the extended nucleotide, the ligated probes are cleaved at the
ribonucleotide moiety with RNase H to remove the label and to regenerate an extendable
end.

The following double stranded fragment comprising a 36-mer binding region is
ligated into a Sac I/Xma I-digested pUC19:

CCTCTCCCTTCCCTCTCCTCCCTCTCCCCTCTCCCTC
TCGAGGAGAGGGAAGGGAGAGGAGGGAGAGGGGAGAGGGAGGGCC

After isolation and amplification, a 402 basepair fragment of the modified pUC19 is
amplified by PCR for use as a template. The fragment spans a region of pUC19 from
position 41 to the binding region inserted adjacent to the Sac I site in the polylinker region
(position 413 of the unmodified pUC19), Yanisch-Perron et al, Gene, 33: 103-119 (1985).
Two 18-mer oligonucleotide probes are employed having sequences 5'-
CCCTCTCCCCTCTCCCTCx-3' and 5'-GCAGCTCCCGGAGACGGT-3', where "x" is a
3' biotin moiety attached during synthesis using a commercially available reagent with
manufacturer's protocol, e.g. 3' Biotin-ON CPG (Clontech Laboratories, Palo Alto,
California). The amplified template is isolated and attached to streptavidin-coated magnetic
beads (Dynabeads) using manufacturer's protocol, Dynabeads Template Preparation Kit,
with M280-streptavidin (Dynal, Inc., Great Neck, New York). A sufficient quantity of the
biotinylated 313 basepair fragment is provided to load about 300 µg of Dynabeads M280-
Streptavidin.

The binding region sequence is chosen so that the duplexes formed with the
initiating oligonucleotides have compositions of about 66% GC to enhance duplex stability.
The sequence is also chosen to prevent secondary structure formation and fortuitous
hybridization of an initializing oligonucleotide to more than one location within the binding
region. Any shifting of position of a given initializing oligonucleotide within the binding
region results in a significant number of mis-matched bases.
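
The composition of the four initializing oligonucleotides listed later in this example (SEQ ID NO: 2 through 5) can be checked with a few lines of Python. The sketch below is illustrative; the function names are hypothetical, and the observation that the four sequences are offset from one another by a single nucleotide is read off the sequences as printed rather than stated in the text.

```python
# The four 21-mer initializing oligonucleotides of Example 1, written 5'->3'.
INITIALIZERS = [
    "GAGGAGAGGGAAGGAGAGGAG",
    "GGAGGAGAGGGAAGGAGAGGA",
    "GGGAGGAGAGGGAAGGAGAGG",
    "AGGGAGGAGAGGGAAGGAGAG",
]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_one_base_stagger(first: str, second: str) -> bool:
    """True if `second` equals `first` shifted by one nucleotide toward the 5' side."""
    return len(first) == len(second) and second[1:] == first[:-1]


if __name__ == "__main__":
    for seq in INITIALIZERS:
        print(seq, len(seq), round(gc_fraction(seq), 3))
    print(all(is_one_base_stagger(a, b) for a, b in zip(INITIALIZERS, INITIALIZERS[1:])))
```

By this count the four oligonucleotides contain 13 or 14 G and C residues out of 21, i.e. roughly 62% to 67% GC, in line with the figure of about 66% given above.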

After loading, the non-biotinylated strand of template is removed by heat
denaturation, after which the magnetic beads are washed and separated into four aliquots.
The template attached to the magnetic beads has the following sequence:

(Magnetic bead)-(linker)-(3')-CTCCCTCTCCCCTCTCCCTCCTC-
TCCCTTCCTCTCCTCGAGCTTAAGT ... CTCGACG-(5')

The following four oligonucleotides are employed as initializing oligonucleotides in
each of the separate aliquots of template:

5'-GAGGAGAGGGAAGGAGAGGAG
5'-GGAGGAGAGGGAAGGAGAGGA
5'-GGGAGGAGAGGGAAGGAGAGG
5'-AGGGAGGAGAGGGAAGGAGAG

Reactions and washes below are generally carried out in 50 µL volumes of manufacturer's
(New England Biolabs) recommended buffers for the enzymes employed, unless otherwise
indicated. Standard buffers are also described in Sambrook et al, Molecular Cloning, 2nd
Edition (Cold Spring Harbor Laboratory Press, 1989).

96 stringency classes of 684 or 682 oligonucleotide probes each (2 subsets for each of
48 different annealing temperatures) are formed which together contain all 8-mer probes for
each of the four aliquots. The probes of each of the 96 classes are separately annealed to
the target polynucleotide in reaction mixtures having the same components, with the
exception that extensions and ligations are carried out with Sequenase and T4 DNA ligase at
temperatures less than 37°C and with Taq Stoffel fragment and a thermostable ligase
otherwise.
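
The class sizes quoted here are consistent with dividing the complete probe set among 96 classes, as the following arithmetic sketch in Python shows. It assumes only that the probe set contains all 4^n possible n-mers and that classes come in the two sizes stated in the text; the function name is hypothetical. The 6-mer figures (classes of 42 or 43 probes) are those given in Example 2.

```python
def class_counts(total_probes: int, n_classes: int, small: int, large: int):
    """Solve for how many classes have each of two sizes.

    Finds (n_small, n_large) such that
        n_small + n_large == n_classes
        n_small * small + n_large * large == total_probes
    and raises ValueError if no whole-number solution exists.
    """
    if large <= small:
        raise ValueError("large must exceed small")
    n_large, remainder = divmod(total_probes - small * n_classes, large - small)
    if remainder != 0 or not 0 <= n_large <= n_classes:
        raise ValueError("sizes are inconsistent with the totals")
    return n_classes - n_large, n_large


if __name__ == "__main__":
    # All 8-mers in 96 classes of 682 or 684 probes.
    print(class_counts(4 ** 8, 96, 682, 684))
    # All 6-mers in 96 classes of 42 or 43 probes.
    print(class_counts(4 ** 6, 96, 42, 43))
```

For 8-mers this yields 64 classes of 682 probes and 32 classes of 684; for 6-mers, 32 classes of 42 probes and 64 classes of 43.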

The 48 stringency conditions are defined by annealing temperatures which range from
22°C to 70°C, such that each grouping of subsets at the same temperature differs in annealing
temperature by 1°C from that of the subset groupings containing the next highest and next
lowest stringency classes. The range of annealing temperatures (22-70°C) is roughly
bounded by the temperatures 5-10 degrees below the temperatures at which the least stable
and most stable 8-mers, respectively, are expected to have about fifty percent maximum
annealing in a standard PCR buffer solution.

After 5-10 minutes incubation at 80°C, the reaction mixtures are brought down to
their respective annealing temperatures over a period of 20-30 minutes. After ligation,
washing and treatment with phosphatase, 2 units of polymerase and labeled
dideoxynucleotide triphosphates (0.08 mM final reaction concentration and labeled with
TAMRA (tetramethylrhodamine), FAM (fluorescein), ROX (rhodamine X), and JOE
(2',7'-dimethoxy-4',5'-dichlorofluorescein)) are added. After 15 minutes, the beads are washed
with H2O and the identity of the extended nucleotide is determined by illuminating each
reaction mixture with standard wavelengths, e.g. User's Manual, model 373 DNA Sequencer
(Applied Biosystems, Foster City, CA).

After identification, the reaction mixtures are treated with RNase H using the
manufacturer's suggested protocol and washed. The RNase H treated extended duplexes
have regenerated 3' hydroxyls and are ready for the next cycle of ligation/extension/
cleavage. The cycles are carried out until all the nucleotides of the test sequence are
identified.

Example 2

Sequencing a Target Polynucleotide Amplified from pUC19
with One Initializing Oligonucleotide

In this example, a template is prepared in accordance with Example 1, except that
since extension is in the 5'->3' direction in this example, the biotin moiety is attached to
the 5' end of the primer hybridizing to the CT-rich strand of the binding region. Thus, in
this example, the binding region of the single stranded template will be a GA-rich segment
(essentially the complement of the binding region of Example 1). Two 18-mer
oligonucleotide probes are employed having sequences 5'-xGAGGGAGAGGGGAGAGGG-3'
and 5'-ACCGTCTCCGGGAGCTGC-3', where "x" is a 5' biotin moiety attached
during synthesis using commercially available reagents with manufacturers' protocols, e.g.
the Aminolink aminoalkylphosphoramidite linking agent (Applied Biosystems, Foster City,
California) and Biotin-X-NHS Ester available from Clontech Laboratories (Palo Alto,
California).

A single 21-mer initializing oligonucleotide is employed with the following sequence:

5'-OP(=O)(O-)O-CCTCTCCCTTCCCTCTCCTCC-3'

6-mer oligonucleotide probes are employed that have an acid labile phosphoramidate linkage
between the 3'-most nucleoside and 3'-penultimate nucleoside of the probe, as shown in the
following formula:

HO-(3')B(5')-OP(=O)(O-)NH-(3')BBBBBt*

where Bt* is a JOE-, FAM-, TAMRA-, or ROX-labeled dideoxynucleoside, such that the
label corresponds to the identity of the 3'-most nucleotide (so 16 different labeled
dideoxynucleosides are used in the synthesis of the probes).

As above, the 6-mer probes are prepared in 96 stringency classes of 42 or 43 probes
each (2 subsets for each of 48 different annealing temperatures). Hybridizations and
ligations are carried out as described above. After ligation and washing, a nucleoside in the
target polynucleotide is identified by the fluorescent signal of the oligonucleotide probe.
Acid cleavage is then carried out by treating the extended duplex with 0.8% trifluoroacetic
acid in dichloromethane for 40 minutes at room temperature to regenerate an extendable end
on the extended duplex. The process continues until the sequence of the target
polynucleotide is determined.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(ii) TITLE OF INVENTION:
DNA Sequencing by Stepwise Extension with
Oligonucleotide Blocks

(iii) NUMBER OF SEQUENCES: 8

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Dehlinger & Associates
(B) STREET: P.O. Box 60850
(C) CITY: Palo Alto, CA
(D) STATE: California
(E) COUNTRY: USA
(F) ZIP: 94306-1546

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: 3.5 inch diskette
(B) COMPUTER: IBM compatible
(C) OPERATING SYSTEM: Windows 3.1/DOS 5.0
(D) SOFTWARE: Microsoft Word for Windows, vers. 2.0

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Vincent M. Powers
(B) REGISTRATION NUMBER: 36,246
(C) REFERENCE/DOCKET NUMBER: peol

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 415-324-0880
(B) TELEFAX: 415-324-0960


(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

CCTCTCCCTT CCCTCTCCTC CCTCTCCCCT CTCCCTC 37

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GAGGAGAGGG AAGGAGAGGA G

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

GGAGGAGAGG GAAGGAGAGG A

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GGGAGGAGAG GGAAGGAGAG G

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

AGGGAGGAGA GGGAAGGAGA G

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GAGGGAGAGG GGAGAGGG

10 

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ACCGTCTCCG GGAGCTGC

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

CCTCTCCCTT CCCTCTCCTC C

I claim:

1. A method for identifying a sequence of nucleotides in a polynucleotide, the 
method comprising the steps of: 
5 (a) extending an initializing oligonucleotide along the polynucleotide by ligating an 

oligonucleotide probe thereto to form an extended duplex; 

(b) identifying one or more nucleotides of the polynucleotide; and 

(c) repeating steps (a) and (b) until the sequence of nucleotides is determined. 

10 2. The method of claim 1 wherein said oligonucleotide probe has a chain- 

terminating moiety at a terminus distal to said initializing oligonucleotide. 

3. The method of claim 2 wherein said step of identifying includes removing 
said chain-terminating moiety and extending said oligonucleotide probe with a nucleic acid 

15 polymerase in the presence of one or more labeled chain-terminating nucleoside 
triphosphates. 

4. The method of claim 3 further including a step of regenerating an 
extendable terminus on said extended duplex. 

20 

5. The method of claim 4 wherein said oligonucleotide probe includes a 
subsequence of four ribonucleotides and wherein said step of regenerating includes cleaving 
said oligonucleotide probe with RNase H. 

6. The method of claim 5 wherein said chain-terminating moiety is a 3' phosphate.

7. The method of claim 2 further including a step of capping an extended
duplex or said initializing oligonucleotide whenever the extended duplex or said initializing
oligonucleotide fails to ligate to said oligonucleotide probe.



8. The method of claim 2 further including a step of regenerating an 
extendable terminus on said extended duplex.

9. The method of claim 8 wherein said step of regenerating includes cleaving a
chemically scissile internucleosidic linkage in said extended duplex.

10. The method of claim 9 wherein said chemically scissile internucleosidic
linkage is a phosphoramidate.

11. The method of claim 8 wherein said step of regenerating includes
enzymatically cleaving an internucleosidic linkage in said extended duplex. 

12. The method of claim 11 wherein said oligonucleotide probe includes a
subsequence of four ribonucleotides and wherein said step of regenerating includes cleaving
said oligonucleotide probe with RNase H.

13. A method for determining the nucleotide sequence of a polynucleotide, the 
method comprising the steps of:

(a) providing a template comprising the polynucleotide; 

(b) providing an initializing oligonucleotide which forms a duplex with the template 
adjacent to the polynucleotide; 

(c) annealing an oligonucleotide probe to the template adjacent to the initializing
oligonucleotide;

(d) ligating the oligonucleotide probe to the initializing oligonucleotide to form 
an extended duplex; 

(e) identifying one or more nucleotides of the polynucleotide by a label on the 
ligated oligonucleotide probe; and 

(f) repeating steps (c) through (e) until the nucleotide sequence of the polynucleotide
is determined.

14. The method of claim 13 wherein said oligonucleotide probe has a chain-
terminating moiety at a terminus distal to said initializing oligonucleotide and wherein said
method further includes a step of regenerating an extendable terminus on said
oligonucleotide probe.

15. The method of claim 14 further including a step of capping said extended 
duplex or said initializing oligonucleotide that fails to ligate to said oligonucleotide probe.

16. The method of claim 14 wherein said step of identifying consists of
identifying a single nucleotide of said polynucleotide.

17. The method of claim 16 wherein said step of identifying includes removing
said chain-terminating moiety and extending said oligonucleotide probe with a nucleic acid
polymerase in the presence of one or more labeled chain-terminating nucleoside
triphosphates.



18. An oligonucleotide probe of the formula:

HO-(3')(B)_j(5')-OP(=O)(O^-)NH-(B)_k-B*

wherein:

B is a nucleotide or an analog thereof;

j is in the range of from 1 to 12;

k is in the range of from 0 to 12, such that the sum of j and k is less than or equal
to 12;

B* is a labeled chain-terminating moiety.



19. An oligonucleotide probe selected from the group consisting of:

OP(=O)(O^-)O-(5')(B)_s RRRR(B)_w B*

HO-(3')(B)_s RRRR(B)_w B*

and

OP(=O)(O^-)O-(5')(B)_s RRRR(B)_w (3')OP(=O)(O^-)O

wherein:

B is a deoxyribonucleotide or an analog thereof;

R is a ribonucleotide;

s is in the range of from 1 to 8;




w is in the range of from 0 to 8, such that the sum of s and w is less than or equal
to 8;

B* is a labeled chain-terminating moiety.
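
The cycle recited in steps (a) through (f) of claim 13 can be illustrated with a short toy simulation. The sketch below is not taken from the specification. It assumes an idealized ligase that joins only the perfectly matched probe, a label that reports the probe's first base, and a regeneration step that leaves the duplex extended by exactly one nucleotide per cycle. The names `complement`, `read_by_ligation`, and `probe_len` are hypothetical and introduced only for illustration.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq: str) -> str:
    """Base-wise complement; both strands are written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def read_by_ligation(template: str, initializer: str, probe_len: int = 3) -> str:
    """Toy model of repeated anneal / ligate / identify cycles along a template."""
    # Step (b): the initializing oligonucleotide must pair with the start of the template.
    if complement(template[: len(initializer)]) != initializer:
        raise ValueError("initializing oligonucleotide does not anneal to the template")

    # Assumed probe pool: every possible oligonucleotide of length probe_len.
    probes = ["".join(p) for p in product("ACGT", repeat=probe_len)]

    position = len(initializer)  # first unpaired template position
    called = []
    while position + probe_len <= len(template):
        target = template[position : position + probe_len]
        # Steps (c)-(d): assume only the perfectly matched probe anneals and ligates.
        ligated = next(p for p in probes if p == complement(target))
        # Step (e): assume the label reports the probe's first base,
        # which identifies one template nucleotide.
        called.append(COMPLEMENT[ligated[0]])
        # Simplified regeneration: the duplex grows by one nucleotide per cycle.
        position += 1
    return "".join(called)


if __name__ == "__main__":
    template = "TGCA" + "GATTACAGG"      # binding site followed by the unknown region
    initializer = complement("TGCA")     # "ACGT"
    print(read_by_ligation(template, initializer))
```

In this simplified model the last `probe_len - 1` template positions are not called, because a full-length probe can no longer anneal there. Real ligation fidelity, capping of unextended duplexes, and the chemistry of regeneration are deliberately left out.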
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